Robust Error Correction for De Novo Assembly via Spectral Partitioning and Sequence Alignment
Authors
Abstract
Error correction is the first step in any de novo assembly of next-generation sequencing (NGS) data. The task is difficult, and most available error correction software handles only base mismatches (substitutions). In this work we propose a novel approach based on spectral graph clustering and Smith-Waterman alignment. The approach supports insertions and deletions in addition to mismatches, and it makes no assumptions about the sequenced data.
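To illustrate why Smith-Waterman alignment can expose insertions and deletions as well as mismatches, here is a minimal sketch of the textbook local-alignment algorithm with a linear gap penalty. It is not the authors' implementation: the function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions chosen for illustration, and the abstract does not say how the alignment is combined with the spectral clustering step.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # gap in b (extra base in a)
                score[i][j - 1] + gap,      # gap in a (extra base in b)
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # A read with one extra base aligned against a hypothetical error-free read.
    print(smith_waterman("ACGTTACG", "ACGTACG"))
```

In the aligned output, a `-` in the second string marks a base present only in the first read, which is how an insertion or deletion error would show up when two overlapping reads are compared.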
Similar resources
Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data
MOTIVATION Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low. RESULTS We prese...
NxRepair: error correction in de novo sequence assembly using Nextera mate pairs
Scaffolding errors and incorrect repeat disambiguation during de novo assembly can result in large scale misassemblies in draft genomes. Nextera mate pair sequencing data provide additional information to resolve assembly ambiguities during scaffolding. Here, we introduce NxRepair, an open source toolkit for error correction in de novo assemblies that uses Nextera mate pair libraries to identif...
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences
MOTIVATION Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads. RESULTS We present a new map...
PepTiger: Search Engine for Error-Tolerant Protein Identification from de Novo Sequences
In recent years a number of de novo sequencing software products became available providing possible partial or complete amino acid sequence tags for MS/MS spectra of peptides. However, for a variety of reasons including spectral chemical noise and imperfect fragmentation these sequence tags almost always contain errors. Additional difficulties arise from actual protein sequence variation and p...